-
Notifications
You must be signed in to change notification settings - Fork 25.6k
ES|QL - Full text functions accept null as field parameter #137430
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
ES|QL - Full text functions accept null as field parameter #137430
Conversation
… and add the FoldNull rule to LocalLogicalPlanOptimizer
…down to an index that doesn't contain the field
|
Hi @carlosdelest, I've created a changelog YAML for you. |
…tions-accept-null' into bugfix/esql-full-text-functions-accept-null
…-functions-accept-null # Conflicts: # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/optimizer/LocalLogicalPlanOptimizer.java # x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/LocalLogicalPlanOptimizerTests.java
| @@ -300,39 +286,7 @@ public final void writeTo(StreamOutput out) throws IOException { | |||
|
|
|||
| @Override | |||
| protected TypeResolution resolveParams() { | |||
| return resolveField().and(resolveQuery()) | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Refactoring - all of these are moved up to SingleFieldFullTextFunction
| } | ||
|
|
||
| @Override | ||
| public boolean foldable() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is what changed on this PR besides all the refactoring. When the FTF field is null, the FTF is foldable to NULL, and nullable.
| new InferNonNullAggConstraint(), | ||
| new ReplaceDateTruncBucketWithRoundTo() | ||
| new ReplaceDateTruncBucketWithRoundTo(), | ||
| new FoldNull() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alex-spies I added foldNull to local logical optimization, so full text functions can be folded to null when null has been replaced. Does it look OK to you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be beneficial, yes. Not sure why we didn't have it earlier.
Do you think you could please add one/some LocalLogicalPlanOptimizerTests with maybe some "easy" other function(s) (string or math?) to check that they locally now get folded too. This should add a tiny bit of performance to other functions too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the FoldNull rule is in the localOperators batch, that is applied after the localRewrites batch. Is it too late to wait for it to be executed in localOperators? I wonder why we need a duplicate of it here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I see - I didn't understand that this rule was already used as part of localOperators(). Removing it, thanks @fang-xing-esql !
| """); | ||
|
|
||
| // Limit[1000[INTEGER],false,false] | ||
| Limit limit = as(plan, Limit.class); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fang-xing-esql , now FTFs can be pushed down - when a field does not exist in the index, it will become NULL instead.
Does this make sense from the union all perspective?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found a potential issue here and I'll provide a reproducer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The query in the test failed with VerificationException because the full text function does not apply to one of the subquery(branch), the field referenced in the full text function is not an index field, it doesn't throw VerificationException in this PR, however I run into this error when the query is executed. Shall we still throw VerificationException for this kind of situations? Or prune the entire branch when the field does not exist in that branch, a related comment is here.
+ curl -u elastic:password -X POST 'localhost:9200/_query?format=txt&pretty' -H 'Content-Type: application/json' '-d
{
"query": "from sample_data_1, (from books) metadata _score | WHERE author:\"Bradbury\" | sort _score desc, author"
}
'
{
"error" : {
"root_cause" : [
{
"type" : "illegal_argument_exception",
"reason" : "Unrecognized class type class org.elasticsearch.xpack.esql.core.expression.Literal"
}
],
"type" : "illegal_argument_exception",
"reason" : "Unrecognized class type class org.elasticsearch.xpack.esql.core.expression.Literal"
},
"status" : 400
}
The mapping and data a like below.
curl -u elastic:password -X POST "localhost:9200/books/_doc?pretty" -H 'Content-Type: application/json' -d'
{"name": "Snow Crash", "author": "Neal Stephenson", "release_date": "1992-06-01", "page_count": 470, "*name": "*Snow Crash"}
'
curl -u elastic:password -X POST "localhost:9200/_bulk?refresh&pretty" -H 'Content-Type: application/json' -d'
{ "index" : { "_index" : "books" } }
{"name": "Revelation Space", "author": "Alastair Reynolds", "release_date": "2000-03-15", "page_count": 585, "*name": "*Revelation Space"}
{ "index" : { "_index" : "books" } }
{"name": "1984", "author": "George Orwell", "release_date": "1985-06-01", "page_count": 328, "*name": "*1984"}
{ "index" : { "_index" : "books" } }
{"name": "Fahrenheit 451", "author": "Ray Bradbury", "release_date": "1953-10-15", "page_count": 227, "*name": "*Fahrenheit 451"}
{ "index" : { "_index" : "books" } }
{"name": "Brave New World", "author": "Aldous Huxley", "release_date": "1932-06-01", "page_count": 268, "*name": "*Brave New World"}
{ "index" : { "_index" : "books" } }
{"name": "The Handmaids Tale", "author": "Margaret Atwood", "release_date": "1985-06-01", "page_count": 311, "*name": "*The Handmaids Tale"}
'
curl -u elastic:password -X PUT "localhost:9200/sample_data_1?pretty" -H 'Content-Type: application/json' -d'
{
"settings" : {"mode" : "lookup"},
"mappings": {
"properties": {
"client.ip": {"type": "ip"},
"message": {"type": "text"},
"test_date": {"type": "date"},
"event.duration": {"type": "long"},
"mv": {"type": "long"}
}
}
}
'
curl -X PUT "localhost:9200/sample_data_1/_bulk?refresh&pretty" -H 'Content-Type: application/json' -d'
{"index": {}}
{"@timestamp": "2023-10-23T12:15:03.360Z", "client.ip": "172.21.2.162", "message": "Connected to 10.1.0.3", "event.duration": 3450233, "test_date": "1985-11-21T00:00:00Z", "mv": [1.0, 2.0]}
{"index": {}}
{"@timestamp": "2023-10-23T12:27:28.948Z", "client.ip": "172.21.2.113", "message": "Connected to 10.1.0.2", "event.duration": 2764889, "test_date": "1985-11-22T00:00:00Z", "mv": [1.0, 2.0]}
{"index": {}}
{"@timestamp": "2023-10-23T13:33:34.937Z", "client.ip": "172.21.0.5", "message": "Disconnected", "event.duration": 1232382, "test_date": "1985-11-23T00:00:00Z", "mv": [1.0, 2.0]}
{"index": {}}
{"@timestamp": "2023-10-23T13:51:54.732Z", "client.ip": "172.21.3.15", "message": "Connection error", "event.duration": 725448, "test_date": "1985-11-24T00:00:00Z", "mv": [1.0, 2.0]}
{"index": {}}
{"@timestamp": "2023-10-23T13:52:55.015Z", "client.ip": "172.21.3.15", "message": "Connection error", "event.duration": 8268153, "test_date": "1985-11-25T00:00:00Z", "mv": [1.0, 2.0]}
{"index": {}}
{"@timestamp": "2023-10-23T13:53:55.832Z", "client.ip": "172.21.3.15", "message": "Connection error", "event.duration": 5033755, "test_date": "1985-11-26T00:00:00Z", "mv": [1.0, 2.0]}
{"index": {}}
{"@timestamp": "2023-10-23T13:55:01.543Z", "client.ip": "172.21.3.15", "message": "Connected to 10.1.0.1", "event.duration": 1756467, "test_date": "1985-11-27T00:00:00Z", "mv": [1.0, 2.0]}
'
|
Pinging @elastic/es-analytical-engine (Team:Analytics) |
|
Pinging @elastic/es-search-relevance (Team:Search Relevance) |
|
Pinging @elastic/search-relevance (Team:Search - Relevance) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes look good to me, but will defer to someone with more ES|QL experience to 👍 😁
| public void testFullTextFunctionCannotBePushedDownPastUnionAll() { | ||
| assumeTrue("Requires subquery in FROM command support", EsqlCapabilities.Cap.SUBQUERY_IN_FROM_COMMAND.isEnabled()); | ||
| VerificationException e = expectThrows(VerificationException.class, () -> planSubquery(""" | ||
| var plan = planSubquery(""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to specify a new capability for this test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't, as the MATCH function is already released (so it's not conditionally added for release / snapshots) and this test is not executed for BwC purposes.
| * - Field verification | ||
| * - Serialization patterns | ||
| */ | ||
| public abstract class SingleFieldFullTextFunction extends FullTextFunction |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice refactoring!
|
@elasticsearchmachine run elasticsearch-ci/bwc-snapshots-part3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a few comments, but after those it should look good.
I think some simple CSV-spec tests would be good to have too, just to check that they work with literals, centrally too. (MATCH(null, ..), ..| EVAL x = null | WHERE MATCH(x, ...)).
Would be great to wait for @fang-xing-esql check too.
...a/org/elasticsearch/xpack/esql/expression/function/fulltext/SingleFieldFullTextFunction.java
Outdated
Show resolved
Hide resolved
...a/org/elasticsearch/xpack/esql/expression/function/fulltext/SingleFieldFullTextFunction.java
Outdated
Show resolved
Hide resolved
...a/org/elasticsearch/xpack/esql/expression/function/fulltext/SingleFieldFullTextFunction.java
Outdated
Show resolved
Hide resolved
| protected FieldAttribute fieldAsFieldAttribute(Expression field) { | ||
| Expression fieldExpression = field; | ||
| // Field may be converted to other data type (field_name :: data_type), so we need to check the original field | ||
| if (fieldExpression instanceof AbstractConvertFunction convertFunction) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A somewhat lateral note: we could have an unpacking loop here, for the (more likely automated query generation) case of nested conversions, like: field::INTEGER::LONG.
...a/org/elasticsearch/xpack/esql/expression/function/fulltext/SingleFieldFullTextFunction.java
Outdated
Show resolved
Hide resolved
...a/org/elasticsearch/xpack/esql/expression/function/fulltext/SingleFieldFullTextFunction.java
Outdated
Show resolved
Hide resolved
...esql/src/main/java/org/elasticsearch/xpack/esql/expression/function/fulltext/MultiMatch.java
Outdated
Show resolved
Hide resolved
...gin/esql/src/main/java/org/elasticsearch/xpack/esql/optimizer/LocalLogicalPlanOptimizer.java
Outdated
Show resolved
Hide resolved
| AbstractMatchFullTextFunctionTests.addQueryAsStringTestCases(suppliers); | ||
| AbstractMatchFullTextFunctionTests.addStringTestCases(suppliers); | ||
| return suppliers; | ||
| return addNullFieldTestCases(suppliers); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think these aren't added to all functions, like MatchPhraseTests (possibly others, Knn?), right?
Raising this since MatchPhrase.FIELD_DATA_TYPES doesn't include NULL.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call! Done in 845cd35
| new InferNonNullAggConstraint(), | ||
| new ReplaceDateTruncBucketWithRoundTo() | ||
| new ReplaceDateTruncBucketWithRoundTo(), | ||
| new FoldNull() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be beneficial, yes. Not sure why we didn't have it earlier.
Do you think you could please add one/some LocalLogicalPlanOptimizerTests with maybe some "easy" other function(s) (string or math?) to check that they locally now get folded too. This should add a tiny bit of performance to other functions too.
Co-authored-by: Bogdan Pintea <[email protected]>
Co-authored-by: Bogdan Pintea <[email protected]>
Co-authored-by: Bogdan Pintea <[email protected]>
…tions-accept-null' into bugfix/esql-full-text-functions-accept-null
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @carlosdelest, and for taking into account subqueries as well! Agreed with @bogdan, it will be great to add some additional tests(verify the logical plan and some CsvTests to validate query results) that have these full text functions with null as input field. If the full text functions are used in filters, the PruneFilters rule might be able to replace its children plans with a LocalRelation, that is a sign that the filter on full text function + null fields is applied successfully.
| @Override | ||
| public Object fold(FoldContext ctx) { | ||
| // We only fold when the field is null (it's not present in the mapping), so we return null | ||
| return Literal.NULL; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we should just return a null here, instead of wrapping a Literal around it? A lot of implementations of fold return a plain null.
Closes #136608
Full text functions that take a field as a parameter should allow null as a field parameter. This is necessary as the field may not be present on an index mapping, and the local optimizer replaces it by null.
We allow null as a field in FullTextFunctions, meaning that they are nullable and will be replaced by
NULL.This PR also refactors full text functions that depend on a single field (
MATCH,MATCH_PHRASE,KNN) to use a common superclass (SingleFieldFullTextFunction) that contains the common field logic.